Navigating networks with limited information 
We study navigation with limited information in networks and demonstrate that many real-world 
networks have a structure which can be described as favoring communication at short distance at the 
cost of constraining communication at long distance. This feature, which is robust and more evident 
with limited than with complete information, reflects both topological and possibly functional design 
characteristics. For example, the characteristics of the networks studied derived from a city and 
from the Internet are manifested through modular network designs. We also observe that directed 
navigation in typical networks requires remarkably little information on the level of individual nodes. 
By studying navigation, or specific signaling, we take a complementary approach to the common 
studies of information transfer devoted to broadcasting of information in studies of virus spreading 
and the like. 

PACS numbers: 89.75.Hc, 89.75.Fb, 89.70.+c 



I. INTRODUCTION 

The study of networks is one possible way to address the relative importance or ease of
communication in complex systems. In this context a large effort has been devoted to the
nonspecific broadcasting that dominates, for example, the Internet in the form of spam mail
or computer viruses. Here we instead focus on specific signaling, since it has been suggested
that sending a signal to one specific node without disturbing the remaining network is a
possible candidate for a design principle in real-world networks. By introducing the search
information, we have addressed this in a general framework in earlier work and in relation to
urban organization in [5]. The philosophy of specific communication between a source node $s$
and a target node $t$ is that $s$ can only send one signal, which subsequently has to be
directed to the desired target node $t$. In principle, for a connected complex network any
target $t$ can be reached from any other node $s$, but distant communication is obviously
neither as easy nor as accurate as close direct communication. In particular, for social
networks it has been observed that knowledge of other people's activities declined
exponentially with their separation in the network and increased linearly with the number of
degenerate paths between them.

To capture this observation, we here use walks in networks. The simplest walker is the random
walker, which has earlier been used to characterize topological features of networks,
including first-passage times [10], large-scale modular features [11], and search utilizing
topological features [12]. Using a simple extension of a random walker, we here discuss
navigation in complex networks.



We consider a random walker that represents the propagating signal released from $s$. Its
probability to reach node $t$ before getting lost on some nondirect and thus nonspecific path
is

$$ \mathcal{P} \propto \sum_{\{\mathrm{path}(s,t)\}} \prod_{j \in \mathrm{path}(s,t)} \frac{1}{k_j} . $$

Here the sum captures the linear gain in alternative shortest paths and the product represents
the exponential decline in probability as distance increases, thus reflecting the overall
functional dependence observed in social networks. In real networks the decline of signals
over a node may be faster than $1/k$, representing the possibility that signals are lost. In
our approach we neglect such losses and turn the probability to reach a specific node into the
minimal information $I = -\log_2(\mathcal{P})$ necessary to travel directly between the two
nodes. In this way, earlier work characterized networks in terms of the minimal information
needed to send walkers between specific nodes.

In this paper we implement the specific signaling by 
letting a walker move from source s to target t and make 
choices of exit links along the walk. This choice is at ev- 
ery node associated to a node information cost that goes 
up with the degree of the node. For a single correct exit 
among k edges, the minimal information cost would be 
$\log_2(k)$ bits. One could easily imagine a higher cost, but 
we here limit our investigations to the optimal organiza- 
tion of information at each node. The total information 
cost in going from a source to a target is counted as the 
sum of all the individual node information costs along 
the way. As high-degree nodes cost more to pass than 
low-degree ones, we find that the total information cost 
depends crucially on the relative organization of high- 
and low-degree nodes, as well as on modular features of 
the network. 
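
As a minimal sketch of this bookkeeping, the snippet below sums the $\log_2(k)$ cost over the nodes where an exit has to be chosen. The function name and the degree values are illustrative assumptions, not taken from the paper; they only show why a route through a high-degree node is more expensive than one through a low-degree node.

```python
import math

def path_information_cost(degrees_on_path):
    """Total bits: one log2(k) choice at every node where an exit must be picked."""
    return sum(math.log2(k) for k in degrees_on_path)

# Hypothetical walks: the source has 4 links, the next node has 2 or 6 links.
low_degree_route = path_information_cost([4, 2])
high_degree_route = path_information_cost([4, 6])
print(low_degree_route, high_degree_route)
```

The target itself contributes nothing, since no exit is chosen there.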

In practice, the walk from $s$ to $t$ may be more or less directed, depending on the walker's
ability to choose exit links that lead it closer to the target. If the walker at each point
along the walk chooses an exit link $e$ that leads it closer to the target, it will arrive at
the target node $t$ after $l_{st}$ steps, equal to the shortest path between $s$ and $t$. But
if the access to node information is "limited" along the way, there are chances to make
mistakes: the walker has a probability to choose an edge that increases its distance to the
target. The walk will then be longer, say by an amount $\Delta l_{st}$ compared with the
shortest path $l_{st}$. The total information cost is then determined by two factors:
directly by the limited node information $i$ and indirectly by the length of the path, which
increases with decreased node information. As the limit on node information $i \to 0$, the
walk becomes random. The limits on the node information affect nodes of high degree more than
low-degree ones; hence the structure of the underlying network plays an important role in the
interplay between typical walk length and the limited node information.

The information measure presented is interesting for two reasons. First, it captures the
information cost paid by a "signal" in some real-world scenarios, for example a newcomer in a
city asking the way to the hotel, where the limited information approach corresponds to not
asking enough questions and instead, to a greater or lesser extent, walking by chance. Second,
it provides a method to characterize and distinguish networks qualitatively and
quantitatively from each other.



II. SEARCH INFORMATION 

We first quantify the information cost in number of bits, $I(s \to t)$, that it takes to
navigate the shortest path from node $s$ to node $t$. This could in principle be done as in
Ref. [6], but given that we have to compute $I(s \to t)$ on the basis of local choices, we
compute a node information $i_{jt}$ on every node $j$ on a walk leading to target $t$. That
is, $i_{jt}$ is the number of bits that one needs on node $j$ in order to select one of the
exits that leads to $t$ along a shortest path. Then, following the walker from $s$ to $t$, we
compute $I(s \to t) = \sum_{j \in \mathrm{path}(st)} i_{jt}$.

If no degenerate paths exist, as in Fig. 1(a), then

$$ i_{jt} = \log_2 k_j , \tag{1} $$

where $k_j$ is the degree (number of links) of node $j$, since the task is to select one link
among $k_j$. $i_{jt}$ can also be understood as the information loss associated with
weighting all links equally, instead of knowing the unique exit path. When there are two or
more degenerate paths from $j$ to $t$, the required information depends on the relative
probabilities with which one wants to choose each shortest path, and Eq. (1) generalizes to



$$ i_{jt} = \log_2(k_j) + \sum_i q_{jit} \log_2 q_{jit} , \tag{2} $$

where $q_{jit}$ is the probability to choose a link to node $i$ from node $j$ on a walk to
node $t$ ($\sum_i q_{jit} = 1$), and $q_{jit} = 0$ if the link is not on a shortest path
between $j$ and $t$ (or if there is no link between $j$ and $i$). Equation (2) counts the
information loss associated with setting all $k_j$ links equal instead of confining them with
the selection probabilities $q_{jit}$. Thus it also counts the information needed to confine
our choice to the limits imposed by $q_{jit}$, given that one has to choose one of $k_j$ exit
links. For example, if all paths are degenerate and chosen with equal weights, $i_{jt} = 0$,
whereas two equally weighted degenerate paths would contribute
$i_{jt} = \log_2(k_j) - \log_2(2)$. Following the line of always choosing the method or
parametrization that represents the lower limit of information, we choose the probability to
leave a node along a link on a shortest path between $s$ and $t$ so as to minimize the total
information cost $I(s \to t)$.

FIG. 1: (Color) Search information $I(s \to t)$ measures the average number of bits one needs
in order to walk along one of the shortest paths from $s$ to $t$. It can be sampled by walking
along the shortest paths between $s$ and $t$: (a) For nodes with a unique direction to the
target, this direction is selected with an information cost $\log_2(k_j)$. (b) For nodes $j$
at branch points between degenerate paths, the exit links are selected with probability
$q_{jit}$ (colored after value) proportional to $P_{jit}$, the probability that a random
walker at that exit would reach the target along a shortest path. Let, for example, the
probability to leave node $s$ to the left be $q$ and accordingly $1 - q$ to take the right
way (the probability is zero to go downwards in the figure since these links are not on a
shortest path to $t$). Then
$I(s \to t)(q) = [\log_2 4 + q \log_2 q + (1 - q) \log_2(1 - q)] + [q \log_2 2 + (1 - q) \log_2 6]$,
where the first bracket is the information cost on node $s$ and the second bracket the cost
on the following step, left and right, respectively. We look for a $q$ that minimizes
$I(s \to t)(q)$, giving $q = (1/2)/(1/6 + 1/2) = 0.75$. $I(s \to t) = 3.0$ in (a) and
$I(s \to t) \approx 2.6$ in (b).


In general, if there are many degenerate paths, the probability to exit to node $e$ from node
$j$ on the shortest path to $t$ is

$$ q_{jet} = \frac{P_{jet}}{\sum_i P_{jit}} , \tag{3} $$

where

$$ P_{jit} = \sum_{\{\mathrm{path}(jit)\}} \prod_{l \in \mathrm{path}(jit)} \frac{1}{k_l} . $$

$P_{jit}$ is the probability to walk the shortest path to $t$ from node $j$ via the link to
node $i$ in an unbiased walk. Any factor common to all exits of $j$, such as $1/k_j$, cancels
in Eq. (3). For example, in Fig. 1(b), $P_{sit}$ is 1/2 to the left and 1/6 to the right.
Thus, not all degenerate paths are weighted equally: it pays off to use additional
information on a node with branching paths in order to avoid paying more information later.
In this way one suppresses paths which go through nodes that have high degrees and thus are
more costly to pass [e.g., the right path in Fig. 1(b)]. We note that the particular $q$
values are chosen to minimize the total information cost, as we want to be consistent with
our overall optimal search approach. In practice this biased branching gives a search
information which is almost indistinguishable from that of a search where each correct link
is chosen with equal probability.
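
The branch-point example of Fig. 1(b) can be checked numerically. The sketch below assumes only the degrees quoted in the caption (4 at the source, 2 on the left branch, 6 on the right branch); the function name and the grid search are illustrative choices, not the authors' procedure. It compares the $q$ given by Eq. (3) with a brute-force minimum of the caption's cost function.

```python
import math

def branch_cost(q, k_s=4, k_left=2, k_right=6):
    """Average bits for the two-branch walk: cost at s plus cost at the next node."""
    cost_at_s = math.log2(k_s) + q * math.log2(q) + (1 - q) * math.log2(1 - q)
    cost_next = q * math.log2(k_left) + (1 - q) * math.log2(k_right)
    return cost_at_s + cost_next

# Eq. (3): exit probabilities proportional to the unbiased-walk weights 1/2 and 1/6.
p_left, p_right = 1 / 2, 1 / 6
q_from_eq3 = p_left / (p_left + p_right)

# Brute-force check on a grid of q values strictly between 0 and 1.
grid = [x / 1000 for x in range(1, 1000)]
q_from_grid = min(grid, key=branch_cost)

print(q_from_eq3, q_from_grid, branch_cost(q_from_eq3))
```

Both routes give $q = 0.75$, and the cost at that point equals $\log_2 6 \approx 2.6$ bits, the value quoted for panel (b).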

For each walk, $I(s \to t)$ is the sum of the contributions along the shortest path from
source $s$ to $t$. If there are degenerate paths, $I(s \to t)$ is calculated by averaging
over many walks from $s$ to $t$. In this way we obtain, by local walks, the search
information for any pair of nodes $s$, $t$, a quantity that may also be calculated directly
[6] from knowing all degenerate paths $\{\mathrm{path}(s,t)\}$ between $s$ and $t$:

$$ I(s \to t) = -\log_2 \Bigl( \sum_{\{\mathrm{path}(s,t)\}} \prod_{j \in \mathrm{path}(s,t)} \frac{1}{k_j} \Bigr) . \tag{4} $$

The present definition differs slightly from that of [6] because we here are also open to the
possibility of returning to a node that was just visited [$1/k_j$ instead of $1/(k_j - 1)$].
Here we do not exclude a step back, since we want to generalize our measure to the case of
nonperfect walks associated with limited or imperfect node information.
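
Equation (4) does not require enumerating the shortest paths one by one. A minimal sketch, assuming an undirected connected network stored as a dictionary of neighbor lists: a breadth-first search from the target gives the distances, and the summed path probability is then accumulated from the target outwards. The function name and the small example network are hypothetical; the network is only chosen to match the degrees quoted in Fig. 1(b).

```python
import math
from collections import deque

def search_information(adj, s, t):
    """I(s -> t) of Eq. (4): -log2 of the summed probability of all shortest paths."""
    dist = {t: 0}
    queue = deque([t])
    while queue:                       # breadth-first distances to the target
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if s not in dist:
        raise ValueError("s and t are not connected")
    # prob[j]: chance that an unbiased walker at j follows some shortest path to t
    prob = {t: 1.0}
    for j in sorted(dist, key=dist.get):   # nodes closer to t are handled first
        if j != t:
            closer = [v for v in adj[j] if dist[v] == dist[j] - 1]
            prob[j] = sum(prob[v] for v in closer) / len(adj[j])
    return -math.log2(prob[s])

# Hypothetical network: s has degree 4, the left branch L degree 2, the right branch R degree 6.
adj = {
    "s": ["L", "R", "d1", "d2"], "L": ["s", "t"], "t": ["L", "R"],
    "R": ["s", "t", "r1", "r2", "r3", "r4"],
    "d1": ["s"], "d2": ["s"], "r1": ["R"], "r2": ["R"], "r3": ["R"], "r4": ["R"],
}
print(search_information(adj, "s", "t"))
```

For this network the two degenerate paths contribute $1/8 + 1/24 = 1/6$, so the result agrees with the averaged local walk of Fig. 1(b).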



III. LIMITED SEARCH INFORMATION 

We now turn to the limited information perspective and assume that the amount of information
at a node is limited to $i$ bits (illustrated in a Java applet). The consequence is that the
walker's probability of not following a shortest path increases as $i$ decreases. Further,
the walk between, say, $s$ and $t$ can be substantially longer than the actual shortest path.
In Fig. 2(b), $i = 0.2$ and the average walk is 4.5 steps, compared to the 2 steps in
Fig. 2(a). To limit $i_{jt}$ to $i$ we blur the $q$ values of node $j$ in Eq. (3) by an
$\epsilon_{jt} \in [0, \infty]$, through a uniform smearing
$q_{jit} \to q_{jit} + \epsilon_{jt}$ that increases the probability to choose any false exit
by an equal amount. Normalizing the local exit probabilities, we obtain the smeared exit
probability

$$ q_{jit} \to \frac{q_{jit} + \epsilon_{jt}}{1 + k_j \epsilon_{jt}} , \tag{5} $$

which interpolates between the optimal nonblurred value $q_{jit}$ and the random walk value
$1/k_j$ in a simple way. $\epsilon_{jt}$ is determined to satisfy

$$ i_{jt} = \log_2(k_j) + \sum_i q_{jit} \log_2 q_{jit} \le i , \tag{6} $$

where the $q_{jit}$ are the smeared probabilities of Eq. (5) and the strict inequality is
only relevant when the unblurred $i_{jt}$ is already lower than the limited node information
$i$. The effect of limited information on choosing a unique correct exit link varies with the
degree $k$ of a node. With information threshold $i = 1$ the probability assigned to this
single correct exit link is 17% if $k = 1000$, 28% if $k = 100$, 68% if $k = 10$, and
obviously 100% if $k \le 2$.
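
The degree dependence quoted above can be explored with a small solver. The sketch below is one reading of Eqs. (5) and (6) for a node with a single correct exit: the correct link gets probability $p$, the $k - 1$ false links share the rest equally, and $p$ is raised by bisection until the node information reaches the limit $i$. The function names are hypothetical. Under this reading the solver gives roughly 17% and 28% for $k = 1000$ and $k = 100$; for $k = 10$ it gives about 58% rather than the quoted 68%, so either the authors' calculation differs in detail from this sketch or the quoted figure was altered in transcription.

```python
import math

def node_information(p_correct, k):
    """Eq. (6) for one correct exit with probability p and k-1 equally likely false exits."""
    if k == 1:
        return 0.0
    info = math.log2(k)
    if p_correct > 0:
        info += p_correct * math.log2(p_correct)
    p_false = (1 - p_correct) / (k - 1)
    if p_false > 0:
        info += (k - 1) * p_false * math.log2(p_false)
    return info

def correct_exit_probability(k, i_limit, tol=1e-12):
    """Largest probability of the correct exit that keeps the node information <= i_limit."""
    if math.log2(k) <= i_limit:
        return 1.0                      # the limit does not bind at this degree
    lo, hi = 1.0 / k, 1.0               # information grows from 0 (uniform) to log2(k)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if node_information(mid, k) > i_limit:
            hi = mid
        else:
            lo = mid
    return lo

for k in (1000, 100, 10, 2):
    print(k, round(correct_exit_probability(k, 1.0), 3))
```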
FIG. 2: (Color) Search with limited information: At each node $j$ along a path from $s$
towards the target $t$, the information cost is limited by $i$ bits, or equivalently, only
$i$ bits are accessible. The color of the links represents $q_{jit}$, the probability to
leave node $j$ along a link to node $i$ on a path towards $t$. With limited information the
weighted exit probabilities $q_{jit}$ are changed according to
$(q_{jit} + \epsilon_{jt})/(1 + k_j \epsilon_{jt}) \to q_{jit}$ to satisfy $i_{jt} \le i$.
In (a) there is no upper limit, or $i = \infty$, and the case reduces to the one in
Fig. 1(b) with $I_\infty(s \to t) \approx 2.6$ bits and the excess walk $\Delta l = 0$. In
(b) $i = 0.2$, $I_{0.2}(s \to t) \approx 0.7$ bits, and the excess walk
$\Delta l \approx 2.5$.



In order to quantify the information associated with walking in different environments
(networks) and information limits $i$, we consider the average number of bits of information
it takes to navigate in a network with $N$ nodes,
$I_i = \frac{1}{N^2} \sum_{s,t} I_i(s \to t)$. $I_i$ thus quantifies the navigability or
search information of networks as in Ref. [6], but takes into account the limited node
information and the associated usage of nonshortest paths.
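
The limited-information walk of this section can be sketched as a small Monte Carlo simulation. The code below is an illustration under stated assumptions, not the authors' implementation: the network is a hypothetical one matching the degrees of Fig. 1(b), the names `node_bits`, `exit_table`, and `walk` are invented here, and the smearing of Eq. (5) is parametrized by $\lambda = k\epsilon/(1 + k\epsilon)$ so that the blurred probability reads $(1 - \lambda) q + \lambda / k$ and $\lambda$ can be found by bisection on $[0, 1]$.

```python
import math
import random
from collections import deque

def node_bits(probs):
    """log2(k) + sum_i q log2 q for the k exit probabilities of one node."""
    return math.log2(len(probs)) + sum(q * math.log2(q) for q in probs if q > 0)

def exit_table(adj, t, i_limit):
    """For every node j != t: (smeared exit probabilities, bits paid at j)."""
    dist, queue = {t: 0}, deque([t])
    while queue:                                   # breadth-first distances to t
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    reach, table = {t: 1.0}, {}
    for j in sorted(dist, key=dist.get):           # nodes closer to t come first
        if j == t:
            continue
        k = len(adj[j])
        w = [reach[v] if dist[v] < dist[j] else 0.0 for v in adj[j]]
        reach[j] = sum(w) / k
        q = [x / sum(w) for x in w]                # unblurred choice, Eq. (3)
        lam = 0.0                                  # lam = k*eps / (1 + k*eps)
        if node_bits(q) > i_limit:
            lo, hi = 0.0, 1.0
            for _ in range(60):                    # bisection on the blur strength
                lam = (lo + hi) / 2
                blurred = [(1 - lam) * x + lam / k for x in q]
                if node_bits(blurred) > i_limit:
                    lo = lam
                else:
                    hi = lam
            lam = hi                               # hi always satisfies the limit
        probs = [(1 - lam) * x + lam / k for x in q]
        table[j] = (probs, node_bits(probs))
    return table

def walk(adj, table, s, t, rng, max_steps=100_000):
    """One walk from s to t; returns (bits paid, number of steps)."""
    node, bits, steps = s, 0.0, 0
    while node != t and steps < max_steps:
        probs, cost = table[node]
        bits += cost
        node = rng.choices(adj[node], weights=probs)[0]
        steps += 1
    return bits, steps

# Hypothetical network with the degrees of Fig. 1(b): s has 4 links, L has 2, R has 6.
adj = {
    "s": ["L", "R", "d1", "d2"], "L": ["s", "t"], "t": ["L", "R"],
    "R": ["s", "t", "r1", "r2", "r3", "r4"],
    "d1": ["s"], "d2": ["s"], "r1": ["R"], "r2": ["R"], "r3": ["R"], "r4": ["R"],
}
rng = random.Random(1)
for i_limit in (math.inf, 0.2):
    table = exit_table(adj, "t", i_limit)
    runs = [walk(adj, table, "s", "t", rng) for _ in range(2000)]
    mean_bits = sum(b for b, _ in runs) / len(runs)
    mean_steps = sum(n for _, n in runs) / len(runs)
    print(i_limit, round(mean_bits, 2), round(mean_steps, 2))
```

Averaging `walk` over all source and target pairs gives an estimate of $I_i$; the numbers obtained for this toy network are not expected to match those of Fig. 2, whose exact topology is not specified in the text.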



IV. RESULTS AND DISCUSSION 

Figure 3 shows the effects a limited $i$ has on a number of model networks. All networks
have $10^4$ nodes: two Erdős-Rényi (ER) networks with different average degrees and two
scale-free (SF) networks with degree distribution $P(k) \propto 1/(k_0 + k)^\gamma$
parametrized by $\gamma$ and $\langle k \rangle$. With this parametrization it is possible
to keep the same number of links in the two scale-free networks with different exponents. The
networks are generated by the method presented in [19], which ensures that they are
uncorrelated and connected. Overall, $I_i$ is nonmonotonic in $i$. For very high $i$ ($> 10$
in Fig. 3) one reproduces the search information of a direct walk. As $i$ is decreased to,
say, $i \approx 1$, the total search information is decreased by a few bits, reflecting the
fact that the gain from asking fewer questions at each node is slightly larger than the cost
of going a few steps longer. In fact, the local minimum roughly corresponds to walks which
typically are $\Delta l \approx 10$ steps longer than the direct path [Figs. 3(a) and 3(b)],
a length comparable to the diameter of the networks. This $\Delta l$ is representative for
typical paths, independent of the direct distance $l$ (for $l > 2$) between source and
target, as illustrated in the inset of Fig. 3(a).

For even smaller $i$ the rapidly increasing length of the walks makes the total information
cost $I_i$ increase; for some networks it even becomes larger than $I_{i=\infty}$. For still
smaller $i$ the walk gradually approaches that of a random walk and the length of the walk is
limited by system size. Thus for small enough $i$, $I_i$ is bound to decrease to zero, a
decline that starts as $i$ decreases below about 0.1 bit.
FIG. 3: (Color online) Search with limited information in Erdős-Rényi (ER) networks and
scale-free (SF) networks with degree distribution $P(k) \propto 1/(k_0 + k)^\gamma$
parametrized by $\gamma$ and $\langle k \rangle$. (a) As the available information at each
node decreases with decreasing $i$, the typical path length for going between two nodes
increases by $\Delta l$ beyond the length of the shortest path between the nodes. The inset
demonstrates that $\Delta l$ is nearly independent of the length of the shortest path between
nodes for $l > 2$. (b) The variation of $I_i$ as a function of the available node
information $i$. (c) The typical search information $I_i$ first decreases and then
subsequently increases as the walk length increases due to the limits on $i$.

The maximum of $I_i$ obtained around $i \approx 0.1$ bit is most striking for networks with
high average degree, especially if they have broad degree distributions. That is because the
information constraint is strongest on the high-degree nodes, where one in principle needs
more information to navigate correctly. To investigate this further we have examined walks of
the type $s \to s$, i.e., walks that start and end in the same node. Roughly independent of
the investigated network, we found that as $i$ decreases below 1 bit, the walks start to
delocalize and are completely delocalized at $i \approx 0.1$. This corresponds to the
information threshold at which the walk lengths depicted in Fig. 3(a) start to saturate and
$I_i$ reaches a maximum. Obviously the value of $i$ where the walkers localize increases with
the average degree $\langle k \rangle$.

The navigability of a network is determined by its topology. That is, it depends on both the
degree distribution and how nodes of various degrees are connected to each other. We will
here focus on comparing a given real-world network with its randomized counterparts, defined
by rewiring links such that all nodes conserve their degree and such that the network remains
globally connected. To quantify navigability in the presence of limited information we
compare the Z scores
$Z(I_i) = (I_i - \langle I_i^{\mathrm{random}} \rangle)/\sigma_i^{\mathrm{random}}$, where
$\langle I_i^{\mathrm{random}} \rangle$ is the average $I_i$ for the corresponding randomized
networks, with standard deviation $\sigma_i^{\mathrm{random}}$. In Fig. 4 we investigate real
networks at four different levels of limited information.
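
The comparison with randomized counterparts can be sketched as follows. This is a simplified illustration with hypothetical function names: the rewiring swaps the endpoints of randomly chosen link pairs so that every node keeps its degree, but, unlike the procedure described above, it does not check that the network stays globally connected; that check would have to be added before using it on real data.

```python
import random
import statistics

def rewire_preserving_degrees(edges, n_swaps, rng):
    """Swap endpoints of random link pairs (a-b, c-d -> a-d, c-b), keeping every degree.

    Simplification: connectivity of the rewired network is NOT verified here.
    """
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue                                  # would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue                                  # would create a double link
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges

def z_score(real_value, random_values):
    """Z = (I - <I_random>) / sigma_random for a sample of randomized networks."""
    return (real_value - statistics.mean(random_values)) / statistics.stdev(random_values)

# Hypothetical small network, and hypothetical search-information values for illustration.
example_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3), (1, 4)]
print(rewire_preserving_degrees(example_edges, 20, random.Random(0)))
print(z_score(5.0, [1.0, 2.0, 3.0]))
```

A positive Z score means the real network costs more bits to navigate than its randomized counterparts, a negative one that it costs fewer.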

The Internet is the network of autonomous systems [15], which in this data set consists of
6474 nodes and 12 572 links; its degree distribution is scale free with
$P(k) \propto 1/k^{2.1}$. In the CEO network (6193 nodes and 43 074 links), chief executive
officers are connected by links if they sit on the same board. The city network is
constructed by mapping 1868 streets to nodes and 3026 intersections to links between the
nodes in the Swedish city Malmö. Yeast is the protein interaction network in Saccharomyces
cerevisiae detected by the two-hybrid experiment [17], and fly refers to the similar network
in Drosophila melanogaster [18]. Both of these networks are pruned to include only
interactions of high confidence, and in both networks we compare with their random
counterparts where both bait and prey connectivity of all proteins are preserved.

Overall, all networks but the Internet maintain their rather poor navigability with
decreasing node information. Thus the overall communication features reported earlier are
robust to limited information and to searches that go beyond the shortest path. The
particular result for the Internet means that its randomized versions are more difficult to
navigate with low information.

To understand the overall navigability in more detail, Fig. 5 resolves $I_i$ into $I_i(l)$,
defined as the average search information over all nodes separated by a shortest path length
$l$. We examine the average information associated with walking to a specific node a
distance $l$ away in the network and include three model networks with 1000 nodes that show
distinguishing features. The modular network is constructed from 33 highly interconnected
communities, each node having 6 links to nodes within the community and each community
having 6 links to other communities. The degree hierarchical network is constructed so that
all shortest paths have the property that they first go to nodes with successively higher
degree (up in the degree hierarchy) and then to nodes with lower and lower degree to the
target. In the degree antihierarchy the networks are constructed to minimize this property
[19]. In order to renormalize for effects associated with the degree distribution we also
here compare with the corresponding $I_i^{\mathrm{random}}(l)$ for the randomized
counterparts.

Let us as an example discuss the Internet, where $I_i(l) - I_i^{\mathrm{random}}(l)$
exhibits a minimum for $l \approx 2$ to $3$ at all $i \ge 1$. This reflects a modular
structure associated with country boundaries [11]. Walks within the modules visit highly
connected nodes less frequently than in the randomized version, where even short paths tend
to go through the hubs. In contrast, when forcing paths to go through highly connected nodes
at very short distances, as they do in the degree hierarchy [see Fig. 5(b)], the search
information at $l \approx 2$ becomes relatively large.

FIG. 4: (Color online) Overall navigability of real-world networks, compared to their random
counterparts presented as Z scores, for four levels of $i = \infty$, 4, 2, 1 bits/node
together with the corresponding average excess path length $\Delta l$. The $\Delta l$ shown
in the figure is for the real networks, but it also very well represents the randomized
counterparts. The Internet (hardwired Internet of autonomous systems [15]) is more sensitive
to limited information than the similarly sized CEO network (chief executive officers
connected by links if they sit on the same board). The city network is the Swedish city
Malmö with streets mapped to nodes and intersections mapped to links. The two biological
networks are the protein-protein interaction networks of Saccharomyces cerevisiae (yeast)
[17] and Drosophila melanogaster (fly) [18].

FIG. 5: (Color) Information horizon of (a) three real-world networks from Fig. 4 and (b)
three model networks of size $N = 1000$: one modular and two scale-free networks with degree
distribution $P(k) \propto k^{-2.4}$, organized to be, respectively, degree hierarchy and
degree antihierarchy [19]. We compare the information associated with navigation between
nodes at distance $l$ with the navigation in randomized counterparts (keeping the degree
sequence) for $i = \infty$, 2, and 1.
FIG. 6: (Color online) The average degree $\langle k \rangle$ of nodes as a function of
distance from a random node, in units of what it is in a randomized version. The networks
are the same as studied in Fig. 5(a). A relatively high value of $\langle k \rangle$ is
associated with the information barrier, as indeed seen by comparing
$\langle k(l) \rangle / \langle k(l) \rangle_{\mathrm{random}}$ with
$I_i(l) - I_i^{\mathrm{random}}(l)$ in Fig. 5(a).



For distances $l > 3$ the path typically escapes the module and goes through highly
connected nodes. The advantage of modular structures at short distance is turned into a
disadvantage at long distance, as is also illustrated for the modular network in Fig. 5(b).
As a consequence, also for the Internet,
$I_{i=\infty}(l) - I_{i=\infty}^{\mathrm{random}}(l)$ is positive at large $l$. A possible
interpretation is that at these larger distances, the country modules are connected through
nodes of high degree, as reflected by the overabundance of high-degree nodes at distances
$l \approx 4$ in Fig. 6. Thus modular structure connected by high-degree nodes gives the
Internet an information horizon at these intermediate distances.

With information limited to $i$, the information cost is reduced substantially, especially
at the highly connected nodes. This is especially important for the Internet, where the
chance to make mistakes on the many large hubs increases substantially with decreased $i$.
In fact, with decreasing $i$ the search gets cheaper in the Internet, but more expensive in
its randomized counterparts. This is because walks in the randomized version bounce between
the hubs, which are more interconnected than in the real Internet [20]. This bouncing adds
to the total information cost through the high cost of passing hubs. In contrast, as $i$
decreases the real Internet in fact increases its communication ability, because many of the
false exits lead to nodes of degree 1 where the walker bounces back without information
cost. Figure 5(b) reveals a similar communication topology in the degree antihierarchy.

For the CEO network, the most striking pattern is that limited information walks at short
distance are much easier in the real network than in its randomized counterparts. The walks
are quite localized in the CEO network, a direct consequence of the highly modular structure
of the fully connected boards. The pattern for the city uncovers a modular structure,
indicated by the high resemblance with the modular network in Fig. 5(b). The design of this
city makes navigation at short distance easier than in a random city, and this feature is
even more evident in the perspective of limited information, as indicated by the stronger
horizon as the node information is decreased.

V. SUMMARY 

The design of network topologies defines the ability to direct signals, thus maintaining
cooperativity in the corresponding system. In this paper we have investigated how the
peer-to-peer communication of networks can be maintained in view of sending signals with the
possibility of making erroneous choices along the signaling paths. This was done by
introducing a walker on the network and quantifying how well this walker located a given
target node, provided with more or less correct information on directions as it moved from
node to node in the network. Overall we have found that the results for unlimited node
information presented in [6] and in our earlier work are robust to limited node information
and nonshortest paths. Thus, characterizing networks with shortest paths is a good proxy for
characterizing also communication where mistakes are allowed. In particular we have
demonstrated that real-world networks as diverse as the Internet, a city network, and in
fact also molecular networks (data not shown) have a structure which can be described as
favoring communication at short distance at the cost of constraining communication at long
distance. There are two aspects of such a communication structure: a tendency to modular
organization and a tendency to constrain signals to certain channels. The modular network
design is a characteristic of both the studied city and the Internet topologies. The feature
associated with certain communication channels was investigated in Fig. 6, where we found a
structure with paths that consist of sequences of several lowly connected nodes. The hubs
typically interfere with the walker some length down the paths, and at least for the
Internet the hubs are associated with communication between the modules.

Finally, and more generally, the fact that one man- 
ages fairly well with small node information in all inves- 
tigated cases, implies that directed navigation in typical 
networks requires remarkably little information on the 
level of individual nodes. 